Quickly generating billion-record synthetic databases
Related resources
Generating Random Regular Graphs Quickly
There are various algorithms known for generating graphs with n vertices of given degrees uniformly at random. Unfortunately, none of them is of practical use for all degree sequences, even for those with all degrees equal. In this paper we examine an algorithm which, although it does not generate uniformly at random, is provably close to a uniform generator when the degrees are relatively smal...
Generating meaningful test databases
Testing is one of the most time-consuming and cost-intensive tasks in software development projects today. A recent report of the NIST [RTI02] estimated the costs for the economy of the United States of America caused by software errors in the year 2000 to range from $22.2 to $59.5 billion. Consequently, in the past few years, many techniques and tools have been developed to reduce the high tes...
Quickly Generating Representative Samples from an RBM-Derived Process
Two learning algorithms were recently proposed – Herding and Fast Persistent Contrastive Divergence (FPCD) – which share the following interesting characteristic: they exploit changes in the model parameters while sampling in order to escape modes and mix better, during the sampling process that is part of the learning algorithm. We justify such approaches as ways to escape modes while approxim...
Record Linkage for Genealogical Databases
In this paper we describe past experience and outline current directions in performing record linkage over large genealogical databases. 1. INTRODUCTION AND MOTIVATION Record linkage is the problem of identifying multiple records that refer to the same real-world entity. In genealogical databases, it is the problem of identifying when individuals situated in different pedigrees refer to the sam...
Generating Databases for Query Workloads
To evaluate the performance of database applications and DBMSs, we usually execute workloads of queries on generated databases of different sizes and measure the response time. This paper introduces MyBenchmark, an offline data generation tool that takes a set of queries as input and generates database instances for which the users can control the characteristics of the resulting workload. Appl...
Journal
Journal title: ACM SIGMOD Record
Year: 1994
ISSN: 0163-5808
DOI: 10.1145/191843.191886